System and method for pre-processing information used by an automated attendant

ABSTRACT

The invention concerns a method and system for pre-processing entries in directory listings. An automated attendant or automated directory listings assistant may use the pre-processed entries. A first directory listings including one or more fields may be received. The one or more fields may be populated with entries including one or more symbol strings. A second directory listings including one or more fields may be received. The one or more fields of the second directory listings may be populated with entries including one or more symbol strings. Entries in the one or more fields of the first directory listings may be correlated with entries in the corresponding one or more fields of the second directory listings. Entries, in the one or more fields of the first directory listings, which do not correlate with entries in the corresponding one or more fields of the second directory listings may be identified. The identified entries may be processed using a rule set corresponding to the field in which the entry is located. Based on the rule set, a corresponding confidence level for the processed entries may be determined. The processed entries having the corresponding confidence level meeting or exceeding a threshold may be automatically modified. The automatically modified entries may be outputted for processing. In alternative embodiments of the present invention, the processed entries having the corresponding confidence level below the threshold may be marked for operator confirmation.

CROSS-REFERENCE TO RELATED PATENT APPLICATION

[0001] This patent application is a continuation application of and claims priority to U.S. patent application Ser. No. 10/041,620 filed Jan. 10, 2002, which claims benefit of U.S. Provisional Patent Application Serial No. 60/300,867 filed Jun. 27, 2001.

TECHNICAL FIELD

[0002] The present invention relates to automatic directory assistance. In particular, the present invention relates to systems and methods for automatically pre-processing entries contained in an informational database used by an automated attendant.

BACKGROUND OF THE INVENTION

[0003] In recent years, automated attendants have become very popular. Many individuals or organizations use automated attendants to automatically provide information to callers and/or to route incoming calls. An example of an automated attendant is an automated directory assistant that automatically provides a telephone number, address, etc. for a business or an individual in response to a user's request.

[0004] Typically, a user places a call and reaches an automated directory assistant (e.g., an Interactive Voice Recognition (IVR) system) that prompts the user for desired information and searches an informational database (e.g., a white pages listings database) for the requested information. The user enters the request, for example, a name of a business or individual via a keyboard, keypad or spoken inputs. The automated attendant searches for a match in the informational database based on the user's input and may output a voice synthesized result if a match can be found.

[0005] When offering automated directory assistance, the informational database may be used for two purposes. One purpose may be to create vocabularies and grammars for the speech recognition engine that recognizes the caller's request and a search engine that searches for a match. The other purpose may be to generate a speech-synthesized output of the requested listing to the caller.

[0006] The information or listings contained in these informational databases may contain abbreviations, acronyms, errors, or other deviations that may prevent the search engine from recognizing the listing as well as the speech synthesizer from pronouncing the listings so that they are understood by the caller. For example, the system may not be able to recognize or pronounce the abbreviation “CLD HARBR SPRNG” to mean “Cold Harbor Springs.” In another example, the speech recognition engine may not understand a caller's request if the caller uses the abbreviation “N-C-double A” to mean “N-C-A-A.”

[0007] Additionally, directory listings are typically optimized for visual presentation, not for conversation. Thus, the word order is often reversed and acronyms are used extensively. Such deviations may further prevent the listing from being recognized. For example, the listing “Smith Joe S., MD” may not be recognized if the caller says “Doctor Joe S. Smith.”

[0008] Such deviations in the listings database and/or in the way callers may pronounce a requested listing may prevent the caller's request for information from being completed automatically or may delay its completion.

[0009] One approach to solving this problem involves having an operator personally inspect each database entry individually and fine-tune each listing. This conventional technique can be impractical when hundreds of thousands and even millions of listings are not only involved, but may also be in a continual state of flux, as is the case with telephone directory listings. Additionally, errors, abbreviations, acronyms, etc. may require intervention of an operator, which can delay the process and prevent the complete automation that is desirable.

SUMMARY OF THE INVENTION

[0010] Embodiments of the present invention concern a method and system for pre-processing entries in directory listings. An automated attendant or automated directory listings assistant may use the pre-processed entries. A first directory listings including one or more fields may be received. The one or more fields may be populated with entries including one or more symbol strings. A second directory listings including one or more fields may be received. The one or more fields of the second directory listings may be populated with entries including one or more symbol strings. Entries in the one or more fields of the first directory listings may be correlated with entries in the corresponding one or more fields of the second directory listings. Entries, in the one or more fields of the first directory listings, which do not correlate with entries in the corresponding one or more fields of the second directory listings may be identified. The identified entries may be processed using a rule set corresponding to the field in which the entry is located. Based on the rule set, a corresponding confidence level for the processed entries may be determined. The processed entries having the corresponding confidence level meeting or exceeding a threshold may be automatically modified. The automatically modified entries may be outputted for processing. In alternative embodiments of the present invention, the processed entries having the corresponding confidence level below the threshold may be marked for operator confirmation.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] Embodiments of the present invention are illustrated by way of example, and not limitation, in the accompanying figures in which like references denote similar elements, and in which:

[0012] FIG. 1 is a block diagram of a directory listings pre-processing system in accordance with an embodiment of the present invention;

[0013] FIG. 2 illustrates a block diagram of a listings pre-processing device in accordance with an embodiment of the present invention;

[0014] FIG. 3 is a block diagram of a graphical user interface in accordance with an exemplary embodiment of the present invention; and

[0015] FIG. 4 is a flowchart showing a listings pre-processing method in accordance with an exemplary embodiment of the present invention.

DETAILED DESCRIPTION

[0016] Embodiments of the present invention relate to an automated and/or semi-automated system that can pre-process directory listings or other information so that the information can be automatically recognized and/or presented to a user. Embodiments of the present invention may utilize a series of pre-processing steps to, for example, correct typographical errors, expand abbreviations to be context sensitive, correct order of words, expand acronyms, and/or specify how acronyms, proper names (people and places) and/or other information should be pronounced.

[0017] The listings pre-processing system, in accordance with embodiments of the present invention, may process listings entries according to a rule set. For example, the system may generate a pre-processed listings output and a corresponding confidence level for each pre-processed listing. The confidence level may be generated based on the rule set to indicate the level of certainty with which the listing was corrected or pre-processed. If, for example, a processed listing has a corresponding confidence level above or at a predetermined threshold, the listing may be sent directly to an automated attendant for immediate use in speech recognition and/or speech synthesis. Optionally and/or additionally, such high confidence outputs may be sent to a storage device for use at a later time and/or to any other device.

[0018] Alternatively, in embodiments of the present invention, if a processed listing has a corresponding confidence level below a predetermined threshold, the processed listing may be sent immediately to, for example, an operator for confirmation and/or correction. Optionally and/or additionally, such low confidence outputs may be sent to a storage device for use at a later time and/or to any other device.

[0019] Embodiments of the present invention may include a graphical user interface (GUI) for presenting, to the operator, the low confidence or questionable listings together with, for example, suggested possible corrections for selection by the operator. Using the GUI, the operator may modify the questionable listings based on one or more rules included in the pre-determined rule set or, alternatively, the operator may modify the questionable listing based on the operator's personal discretion. In embodiments of the present invention, the operator may create additional rules that may be used to pre-process the listings. These additional rules, created by the operator, may be included in the predetermined rule set to pre-process the listings in accordance with embodiments of the present invention.

[0020] FIG. 1 is a block diagram of a directory listings pre-processing system 100 according to an exemplary embodiment of the present invention. The directory listings pre-processing system 100 may include a listings pre-processing device (LPPD) 120 that may operate in accordance with embodiments of the present invention.

[0021] In embodiments of the present invention, the LPPD 120 may receive information entries from an informational database 110. For example, the informational database 110 may be a white pages listings database that may include a plurality of fields including one or more information entries. The plurality of fields may include names of individuals and/or businesses, corresponding street addresses, township, city, state and/or country names, zip codes, telephone numbers, e-mail addresses, web site addresses, and/or any other information relating to the individuals and/or businesses. It is recognized that the database 110 may include any type of information that may be used by automated attendants to provide a variety of products and/or services to users. It is also recognized that embodiments of the present invention may be used to pre-process any type of information to correct errors, expand abbreviations, add abbreviations, expand acronyms, add acronyms, etc.

[0022] In embodiments of the present invention, entries in the various databases, referred to or described herein, may include one or more symbol strings. Symbol strings as used herein may be text or character strings that represent individual or business listings and/or other information.

[0023] Although FIG. 1 shows the informational database 110 as a single database, it is recognized that the database 110 may be a plurality of different databases where each database may contain a specific type of information. For example, one type of the informational database 110 may contain only individual and/or business names, while another type may contain only addresses, while yet another type may contain names and corresponding phone numbers and/or corresponding township names, etc.

[0024] The database 110 may be a typical information repository such as a white pages listings database used by automated directory assistants to search for and provide information to callers. Typically, the database 110 may contain at least some entries that may contain errors or other deviations that may prevent the entry from being recognized automatically by, for example, a speech recognizer and/or pronounced by a speech synthesizer. For example, the database 110 may contain entries, in one or more fields, that contain spelling errors, typographical errors, acronyms, abbreviations, improper or varying pronunciation, improper or varying word order and/or other informalities that may prevent entries from being recognized by a speech recognizer and/or pronounced by a speech synthesizer.

[0025] In embodiments of the present invention, LPPD 120 may receive and/or retrieve informational entries from the database 110 and may pre-process the entries based on one or more pre-determined rule sets, in accordance with embodiments of the present invention (to be described below in more detail). Pre-processing the entries of database 110, in accordance with embodiments of the present invention, may reduce the delays and/or inefficiencies that may otherwise be encountered by, for example, an automated directory assistant when searching for a user's request.

[0026] In embodiments of the present invention, after the LPPD 120 pre-processes the entries from database 110, the pre-processed entries may be forwarded to, for example, the automated attendant 190 for storage and/or immediate use.

[0027] In embodiments of the present invention, the pre-processed entries may be stored in the pre-processed listings database 132 located in, for example, the speech recognition system 130 of automated attendant 190. The grammar generator 134 may generate one or more grammars using the pre-processed entries stored in pre-processed listings database 132. The grammar generator 134 may be any type of known hardware and/or software device for generating grammars. The generated grammars may be stored in the vocabulary/grammars database 136. The automated attendant 190 may utilize the grammars generated based on the pre-processed listings to search for the user's request for information.

[0028] In accordance with embodiments of the present invention, the automated attendant 190 may further utilize the pre-processed entries received from LPPD 120 to generate a spoken output for the requested information using speech synthesizer 140. The pre-processed entries may be stored in pronunciation dictionary 142 and forwarded to the speech synthesis device 144. The speech synthesis device 144 may be any type of speech synthesizer known in the art. The pronunciation dictionary 142 may include at least one pronunciation of each word of the pre-processed entries received from the LPPD 120. The speech synthesis device 144 may generate sound files based on the pre-processed listings received from LPPD 120 and store the generated sound files in sound files database 146. The generated sound files from database 146 may be output to the user by automated attendant 190 to complete the user's request for information.

[0029] The automated attendant 190 may include other components and/or devices that are not shown for simplicity. The automated attendant 190 may engage in further dialog with the user to provide additional information, and/or to conduct additional searches in the event the user is not satisfied by the results provided by the automated attendant 190. Additionally, the automated attendant may provide the user with other services such as initiating a call on the user's behalf based on the searched information and/or other known automated services.

[0030] FIG. 2 is a block diagram of the LPPD 120 in accordance with an embodiment of the present invention. The LPPD 120 may include a pre-processor 220, a reference database 270, a rules database 211, a non-confirmed listings database 240 and a confirmed pre-processed listings database 250. It is recognized that any suitable hardware and/or software may be used by one of ordinary skill in the art to configure and/or implement the LPPD 120 in accordance with embodiments of the present invention.

[0031] In embodiments of the present invention, the pre-processor 220 may include, for example, a word order normalizer 221, a street name expander 223, and/or a township corrector 225. The pre-processor 220 may include additional components such as a spelling checker, abbreviation expander, acronym detector, pronunciation generator, grammar checker, and/or corrector, etc. (not shown).

[0032] In embodiments of the present invention, the plurality of databases (e.g., databases 270, 211, 240, 250, etc.) shown can be stored in a memory device that may be located internal to and/or external to the LPPD 120.

[0033] In embodiments of the present invention, LPPD 120 may receive, for example, a white pages listings from informational database 110 for pre-processing. The white pages listings from database 110 may contain a plurality of fields that contain a plurality of entries. The white pages listings database 110 may include such fields as individual and/or business names, corresponding street addresses, townships, zip codes, etc. It is recognized that the white pages listings database 110 may include additional fields containing, for example, e-mail addresses, web page addresses, phone numbers, etc.

[0034] In embodiments of the present invention, the listings pre-processing device 120 receives the plurality of entries from, for example, the white pages listings database 110 and may pre-process the entries according to one or more rules included in the rules database 211. The pre-processed entries may be forwarded to, for example, an automated attendant or to an operator. The listings may be pre-processed periodically or may be pre-processed as desired by, for example, an operator.

[0035] In embodiments of the present invention, the word order normalizer 221 may correct the order of names included in the “Names” field of listings database 110 based on corresponding rules in the rules database 211. The normalizer 221 may recognize the names field from among the plurality of fields included in the database 110 using, for example, clues in the corresponding entries to identify that the listing corresponds to a person's name. For example, the normalizer 221 may look for titles such as doctor, MD, accountant, Esq., etc. appearing in the entry to identify that the listing represents an individual's name. After the field is recognized, the normalizer 221 may verify and correct, if necessary, the order of the names in the corresponding field.

[0036] In embodiments of the present invention, the normalizer 221 may correlate the first and the last names as appearing in each entry of the listings database 110 to corresponding entries in the reference database 270. The normalizer 221 may identify entries in the database 110 that correspond to a name and title of an individual. The reference database 270 may be a pre-verified database that may contain, for example, a list of the top N (e.g., 10000) most frequent first names, and top N most frequent last names. The normalizer 221 then may correlate each word in the listing to the reference database 270, and determine which is likely to be a given name and which is the family name, and change the order of the words accordingly. In alternative embodiments of the present invention, the reference database 270 may be, for example, a pre-verified database that is used by, for example, a postal service. In this case, the reference database 270 may contain names, street names, and full addresses, etc. of individuals and/or businesses in a particular community, town, city, state, and/or country. It is also recognized that reference database 270 can be any type of database containing verified entries that can be used to verify entries included in any other type of database.

[0037] In embodiments of the present invention, after the normalizer 221 identifies entries in the database 110 that do not correlate with corresponding entries in the reference database 270, the normalizer 221 may process those entries in accordance with the corresponding rule in the rules database 211. The order normalizer 221 may identify, based on the correlation with the reference database 270, entries in the listings database 110 that have, for example, inverted or otherwise errant entries.

[0038] For example, during a pre-processing step, normalizer 221 may receive an entry such as “Smith, John M.D.” specified in the names field. The normalizer 221 may confirm that the entry belongs in the names field based on, for example, the title “M.D.” included in the entry. Based on a rule set for the word order normalizer 221 contained in the rule set database 211, the normalizer 221 may compare the entries “Smith” and “John” with entries contained in the given and family names fields of the reference database 270.

[0039] In embodiments of the present invention, the reference database 270 may be, for example, a list of the top N (e.g., 10000) most frequent first names, and top N most frequent last names. The normalizer 221 may find a match for the entry “Smith” in the frequent family names field, and for “John” in the frequent given names field in the reference database 270. The normalizer 221 may determine that the name or word order of the entry should be re-arranged to read “John Smith.”

[0040] In addition, based on a rule set for the normalizer 221 contained in the rule set database 211, the abbreviation “M.D.” may be changed or expanded to “Doctor.” Accordingly, the normalizer 221 may modify the entry “Smith, John M.D.” to “Doctor John Smith.”
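The word order normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the small name sets stand in for the reference database 270, the title table stands in for a rule from the rules database 211, and the function name `normalize_name` is hypothetical.

```python
# Hypothetical stand-ins for the frequent-name lists of reference database 270.
GIVEN_NAMES = {"john", "joe", "michael"}
FAMILY_NAMES = {"smith", "jones"}
# Hypothetical title-expansion rule standing in for rules database 211.
TITLES = {"md": "Doctor"}


def normalize_name(entry):
    """Reorder a 'Family, Given Title' listing into 'Title Given Family'."""
    title = None
    names = []
    for token in entry.replace(",", " ").split():
        key = token.lower().replace(".", "")  # "M.D." and "MD" both become "md"
        if key in TITLES:
            title = TITLES[key]
        else:
            names.append(token)
    # Swap only when the reference lists agree that the order is inverted.
    if len(names) == 2:
        first, second = names
        if first.lower() in FAMILY_NAMES and second.lower() in GIVEN_NAMES:
            names = [second, first]
    return " ".join(([title] if title else []) + names)


print(normalize_name("Smith, John M.D."))
```

With these assumed lists, the entry “Smith, John M.D.” is rewritten as “Doctor John Smith”, while an entry already in given-name-first order is left unchanged.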

[0041] In embodiments of the present invention, after the entry has been modified, the pre-processor 220 may determine, based on the rules used to modify the entry from rules database 211, a confidence level for the corresponding pre-processed entry. The determined confidence level may be compared to a pre-determined threshold that may be set for one or more entries. It is recognized that separate threshold levels can be set for a particular entry or particular types of entries. For example, entries in the “Names” field may have one threshold and entries in the “Address” field may have another threshold. If a pre-processed entry has a corresponding confidence level above the corresponding threshold (also referred to herein as being processed with a high level of confidence), the modified entry may be stored in the confirmed pre-processed listings database 250 and/or may be forwarded directly to the automated attendant 190.

[0042] In embodiments of the invention, the confidence levels can be determined dynamically, based upon the rules and degree of correlation with the reference database 270. For example, the entry “John Michael M.D.” may be converted to “Doctor Michael John” with low confidence because both “John” and “Michael” are listed as frequent given names in the reference database 270. The entry “Smith John J. MD” may be converted to “Doctor John J. Smith” with a high confidence level, since “John” is a likely given name and “Smith” is a likely family name according to the reference database 270. Additionally, this entry may have a high confidence level based on a rule that, for example, says that a middle initial is likely to follow a given name, as opposed to family name.
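One way to picture such a dynamically determined confidence level is as a score built from the individual pieces of evidence named above. The sketch below is an assumption-laden illustration: the weights, the 0-to-1 scale, the name sets and the function names are all hypothetical and are not taken from the disclosure.

```python
# Hypothetical stand-ins for the frequent-name lists of reference database 270.
GIVEN_NAMES = {"john", "michael", "joe"}
FAMILY_NAMES = {"smith", "jones"}


def is_initial(token):
    """True for a single letter with an optional trailing period, e.g. 'J.'."""
    core = token.rstrip(".")
    return len(core) == 1 and core.isalpha()


def reorder_confidence(words):
    """Score, in [0, 1], the belief that words are in 'family given [initial]' order.

    words holds the name tokens in listing order with any title removed,
    e.g. ["Smith", "John", "J."]. The weights are illustrative only.
    """
    family, given = words[0].lower(), words[1].lower()
    score = 0.0
    if given in GIVEN_NAMES:
        score += 0.4  # second word is a frequent given name
    if family in FAMILY_NAMES:
        score += 0.4  # first word is a frequent family name
    if family in GIVEN_NAMES:
        score -= 0.3  # ambiguous: first word could equally be a given name
    if len(words) > 2 and is_initial(words[2]):
        score += 0.2  # a middle initial tends to follow a given name
    return max(0.0, min(1.0, score))


high = reorder_confidence(["Smith", "John", "J."])
low = reorder_confidence(["John", "Michael"])
print(high > low)
```

Under these assumed weights, “Smith John J.” scores far higher than “John Michael”, which mirrors the high-confidence and low-confidence cases in the paragraph above.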

[0043] In alternative embodiments of the present invention, if a pre-processed entry has a corresponding confidence level below the corresponding threshold (also referred to herein as being processed with a low level of confidence), the modified entry may be forwarded to, for example, the non-confirmed listings database 240. The non-confirmed listings database 240 may be accessed by an operator using an operator interface 180. The operator may check the entry to determine if the entry is correct or may modify the entry in accordance with embodiments of the present invention (to be described below in more detail).
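The routing of high-confidence and low-confidence entries, with a separate threshold per field, might be sketched as follows. The threshold values, the default, and the destination labels are hypothetical; the comparison uses "meets or exceeds", consistent with the summary of the invention.

```python
# Hypothetical per-field thresholds; the disclosure only states that they may differ.
THRESHOLDS = {"Names": 0.8, "Address": 0.7}
DEFAULT_THRESHOLD = 0.75  # assumed fallback for fields without their own threshold


def route_entry(field, confidence):
    """Choose a destination for a pre-processed entry from its confidence level."""
    threshold = THRESHOLDS.get(field, DEFAULT_THRESHOLD)
    if confidence >= threshold:
        return "confirmed_listings_250"  # may also go straight to the attendant 190
    return "non_confirmed_listings_240"  # held for operator review via interface 180


print(route_entry("Names", 0.75), route_entry("Address", 0.75))
```

Here the same confidence value of 0.75 is held for operator review in the “Names” field but accepted automatically in the “Address” field, because the two fields carry different assumed thresholds.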

[0044] In embodiments of the present invention, street name expander 223 may receive and pre-process entries in the “Address” field of the listings database 110 based on corresponding rules in the rules database 211. The street name expander 223 may identify entries in the database 110 that do not match or correlate with the corresponding entries in the reference database 270. For example, the entries located in the address field may include street names that may include abbreviations that may need to be expanded, and/or typographical errors and/or misspellings that need to be corrected. The street name expander 223 may receive all of the entries in the address field from database 110 and correlate the street name in each entry of database 110 to street name entries located in the reference database 270 to correct any deviations in the database 110.

[0045] According to the rule set in the rules database 211, the street name expander 223 may correlate only entries with respect to a township, city, etc. in which the street address is located. In alternative embodiments of the present invention, the street name expander 223 may correlate all of the entries in the database 110 with corresponding entries in reference database 270. The street name expander 223 may compare street address entries in the listings database 110 with corresponding field entries in the reference database 270.

[0046] If the expander 223 identifies entries in database 110 that do not correlate with corresponding entries in the reference database 270, the expander 223 may, based on the corresponding rules 211, modify such entries as needed. If a close match between a corresponding entry of the database 110 and reference database 270 is found, the street name in the database 110 may be modified. For example, the entry “Yale Dr.” may be modified to “Yale Drive” based on a match found in the reference database 270. Additionally, street name expander 223 may modify the entry to correct other errors that may be included in the entry.
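A rough sketch of the street name expansion follows, using Python's standard `difflib` module to measure how closely an entry matches a reference street name. The street list, the suffix table and the use of the similarity ratio as a confidence level are assumptions made for illustration; the disclosure does not specify a matching algorithm.

```python
import difflib

# Hypothetical street names standing in for reference database 270.
REFERENCE_STREETS = ["Yale Drive", "Yale Avenue", "Cold Harbor Road"]
# Hypothetical suffix-expansion rules standing in for rules database 211.
SUFFIXES = {"dr": "Drive", "ave": "Avenue", "rd": "Road", "st": "Street"}


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def expand_street(entry, reference=REFERENCE_STREETS):
    """Return (best matching reference street, similarity used as confidence)."""
    words = entry.split()
    if not words:
        return entry, 0.0
    last = words[-1].rstrip(".").lower()
    if last in SUFFIXES:
        words[-1] = SUFFIXES[last]  # e.g. "Dr." becomes "Drive"
    candidate = " ".join(words)
    best = max(reference, key=lambda ref: similarity(candidate, ref))
    return best, similarity(candidate, best)


print(expand_street("Yale Dr."))
```

An exact match after suffix expansion yields a ratio of 1.0, while a misspelled entry yields a lower ratio that could then be compared against the field's threshold.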

[0047] If the modification is performed with a high level of confidence, the modified entry may be sent to the confirmed pre-processed listings database 250 for storage and/or sent to the automated attendant 190. Alternatively, if the modification is performed with a low level of confidence, the modified entry may be forwarded to the non-confirmed listings database 240 for operator confirmation and/or modification as described herein.

[0048] In embodiments of the present invention, township corrector 225 may receive and pre-process entries in the “Township” field of the listings database 110 based on corresponding rules in the rules database 211. As used herein, the term township may refer to the community, town, the city, state, etc. of interest. In embodiments of the present invention, township corrector 225 may correlate entries in the township field of white pages listings database 110 with corresponding entries in the reference database 270.

[0049] In embodiments of the present invention, the township corrector 225 may employ corresponding rules from rules database 211 to pre-process the township entries. The township corrector 225 may identify entries in the database 110 that do not match or correlate with the corresponding entries in the reference database 270. For example, based on the rules, the township corrector 225 may correlate the township entries in database 110 with corresponding entries in the reference database 270 to expand abbreviations, and/or to correct typographical errors and/or misspellings, or to remove extraneous information included in the township entry. For example, the township corrector 225 may remove extraneous information, for example, words such as township, city, etc. after a valid name, and/or hyphens or other punctuation that does not appear in the corresponding township entries in the reference database 270.

[0050] In embodiments of the present invention, the township corrector 225 may use, for example, a zip code entry to correlate a township name in the database 110 with corresponding entries in the reference database 270.
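The township clean-up and zip code correlation described above might look like the following sketch. The township names, the zip code, and the word list are invented for illustration, and the sketch simplifies the description by dropping extraneous words wherever they occur rather than only after a valid name.

```python
import re

# Hypothetical stand-ins for reference database 270; the zip code is made up.
VALID_TOWNSHIPS = {"Cold Harbor Springs", "Oak Grove"}
ZIP_TO_TOWNSHIP = {"00000": "Cold Harbor Springs"}
# Hypothetical list of extraneous words to drop from a township entry.
EXTRANEOUS = {"township", "twp", "city", "town"}


def correct_township(entry, zip_code=None):
    """Return a cleaned township name, falling back to a zip code lookup."""
    cleaned = re.sub(r"[^\w\s]", " ", entry)  # hyphens and punctuation to spaces
    words = [w for w in cleaned.split() if w.lower() not in EXTRANEOUS]
    candidate = " ".join(words).title()
    if candidate in VALID_TOWNSHIPS:
        return candidate
    if zip_code in ZIP_TO_TOWNSHIP:
        return ZIP_TO_TOWNSHIP[zip_code]  # correlate through the zip code entry
    return entry  # no correction found; leave the entry for later review


print(correct_township("Cold-Harbor Springs Township"))
print(correct_township("CLD HARBR SPRNG", zip_code="00000"))
```

The first call succeeds by removing the hyphen and the word "Township"; the second cannot be repaired by clean-up alone and relies on the assumed zip code mapping.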

[0051] If the township corrector 225 identifies entries in database 110 that do not correlate with corresponding entries in the reference database 270, the township corrector 225 may, based on the corresponding rules 211, modify such entries as needed. If the modification is performed with a high level of confidence, the modified entry may be sent to the confirmed pre-processed listings database 250 for storage and/or sent to the automated attendant 190. Alternatively, if the modification is performed with a low level of confidence, the modified entry may be forwarded to the non-confirmed listings database 240 for operator confirmation and/or modification as described herein.

[0052] It is recognized that spelling and/or punctuation/grammar errors may be corrected as the components of the pre-processor 220 process the entries of database 110 as described above. Alternatively, the pre-processor 220 may also include a separate spelling checker and/or grammar checker (not shown) to correct spelling and/or grammar errors in the entries.

[0053] FIG. 3 is a block diagram illustrating the use of an operator interface 180 in accordance with an embodiment of the present invention. The operator interface 180 may be a GUI used by an operator to confirm and/or modify entries pre-processed by pre-processor 220 with a low confidence level. Additionally, the operator interface 180 may be used to edit and/or add rules to the rules database 211.

[0054] In embodiments of the present invention, if the pre-processor 220 determines, based on the rules in database 211, that an entry in database 110 was modified or pre-processed with a low confidence level, the entry is forwarded to the non-confirmed listings database 240, as shown in FIG. 3. In embodiments of the present invention, using interface 180 an operator may access the non-confirmed entries residing in database 240 and determine whether the modifications are correct. If the low confidence modifications are determined to be correct by the operator, the modified entries may be sent to the confirmed pre-processed listings database 250 for storage and/or to the automated attendant 190.

[0055] Alternatively, in embodiments of the present invention, if the operator determines that one or more entries in the non-confirmed listings database 240 are not correct, the operator using operator interface 180 may be presented with a plurality of suggested corrections, generated by the system using the rules in rules database 211, that may be used to modify the entry. Using the input interface 300, the operator may select one of the choices presented by the GUI 180. The operator's choice may be captured by the GUI 180 and the pre-processor may pre-process the entry in accordance with the selected correction. Alternatively, the operator may modify the entry at the operator's discretion. The modified entry may be sent to the confirmed pre-processed listings database 250 for storage and/or to the automated attendant 190.

[0056] In alternative embodiments of the present invention, the operator may use the GUI 180 to compile a new rule set and/or modify an existing rule set. The newly compiled rule set may be captured by the GUI 180 and the pre-processor may pre-process the entry in accordance with the newly compiled rule set. If a new rule is compiled, the operator may also choose the scope of application for the new rule. In other words, the GUI 180 may present the operator with selections relating to the scope of the new or modified rules, and the operator may select how the newly compiled rules should be applied. The operator may select that the newly compiled rule should be applied globally, for the current case only, for future cases, for previous cases, for all names, for all states, for all townships and/or any other case desirable. Using the input interface 300, the operator may select one of the choices presented by the GUI 180. The operator's choice may be captured by the GUI 180 and the pre-processor may apply the rule in accordance with the operator's selection.

[0057] FIG. 4 is a flowchart illustrating a listings pre-processing method in accordance with an exemplary embodiment of the present invention. As shown in step 4010, a pre-processor 220 of listings pre-processing device 120 receives a first directory listings that includes one or more fields. For example, the first directory listings may be a white pages listings from database 110. The one or more fields included in the first directory listings may contain one or more entries and the entries may contain one or more symbol strings. The pre-processor receives a second directory listings that also includes one or more fields, as shown in step 4020. The second directory listings may be, for example, a reference database 270. The one or more fields included in the second directory listings may contain one or more entries and the entries may contain one or more symbol strings.

[0058] After the pre-processor 220 receives the first and second directory listings, the pre-processor 220 correlates entries in the one or more fields of the first directory listings with entries in the corresponding one or more fields of the second directory listings, as shown in step 4030. As shown in step 4040, the pre-processor 220 identifies entries, in the one or more fields of the first directory listings, which do not correlate with entries in the corresponding one or more fields of the second directory listings. The identified entries are processed using a rule set corresponding to the field in which the entry is located, as shown in step 4050. The pre-processor 220, based on the corresponding rule set, determines a corresponding confidence level for the processed entries, as shown in step 4055.
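
Steps 4030 to 4055 might be sketched as below for a single street-name field. The sketch rests on several assumptions not found in the specification: correlation is modeled as a case-insensitive exact match, the rule set is a hypothetical abbreviation table, and the confidence level is approximated by a string-similarity ratio against the closest reference entry.

```python
from difflib import SequenceMatcher

# Hypothetical rule set for a street-name field: abbreviation expansions.
STREET_RULES = {"St": "Street", "Ave": "Avenue", "Rd": "Road"}


def apply_rules(entry, rules):
    """Expand each word found in the (assumed) abbreviation table."""
    return " ".join(rules.get(word.rstrip("."), word) for word in entry.split())


def confidence(candidate, reference_entries):
    """Assumed confidence measure: best similarity ratio against the reference."""
    return max(
        (SequenceMatcher(None, candidate.lower(), ref.lower()).ratio()
         for ref in reference_entries),
        default=0.0,
    )


def preprocess_field(first_listing, reference_entries, rules):
    """Correlate entries, then process and score those that do not correlate."""
    reference_set = {ref.lower() for ref in reference_entries}
    results = []
    for entry in first_listing:
        if entry.lower() in reference_set:      # step 4030: entry correlates
            continue
        processed = apply_rules(entry, rules)   # steps 4040 and 4050
        score = confidence(processed, reference_entries)  # step 4055
        results.append((entry, processed, score))
    return results
```

Each result is a tuple of the original entry, the proposed modification and its score, which a later step could compare against a threshold.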

[0059] In embodiments of the present invention, if the identified entries have a corresponding confidence level exceeding or meeting a threshold, then the processed entries are automatically modified, as shown in steps 4060-4070. In that case, the modified entries are output for processing, as shown in step 4080. For example, the modified entries may be output to a confirmed pre-processed listings database 250 and/or to an automated attendant 190.

[0060] If in step 4060 the identified entries have a corresponding confidence level below the threshold, the processed entries are marked for operator confirmation, as shown in step 4090. The marked entries are presented to the operator for confirmation and/or further modification, as shown in step 4100.
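
The threshold decision of steps 4060 to 4100 can be illustrated with a short sketch. The `ProcessedEntry` record, the `route_entries` function and the numeric confidence scale are assumptions made for the example, since the specification does not fix a data layout or scale.

```python
from typing import List, NamedTuple, Tuple


class ProcessedEntry(NamedTuple):
    """Hypothetical record produced by the rule-based processing step."""
    original: str
    proposed: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def route_entries(
    entries: List[ProcessedEntry], threshold: float
) -> Tuple[List[str], List[ProcessedEntry]]:
    """Split entries into auto-modified output and operator-review items."""
    confirmed: List[str] = []
    non_confirmed: List[ProcessedEntry] = []
    for entry in entries:
        if entry.confidence >= threshold:
            # Meets or exceeds the threshold: modify automatically and output.
            confirmed.append(entry.proposed)
        else:
            # Below the threshold: mark for operator confirmation.
            non_confirmed.append(entry)
    return confirmed, non_confirmed
```

In this sketch the first list plays the role of the confirmed pre-processed listings database 250 and the second that of the non-confirmed listings database 240.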

[0061] In embodiments of the present invention, the operator may use a GUI to check the entries. The operator may modify the entries using existing rules or the operator may modify the entries using new rules. In embodiments of the present invention, the operator may edit or update a rule and/or may add a new rule to the rules database 211. If the operator edits an existing rule and/or adds a new rule, previously modified entries may be processed using the updated rule and/or the new rule. Once the entries are modified by operator intervention, and/or a modified or new rule set, the modified entries are output for processing, as shown in step 4080. As indicated above, the modified entries may be output to a confirmed pre-processed listings database 250 and/or to an automated attendant 190.

[0062] Several embodiments of the present invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.

What is claimed is:
 1. A method for pre-processing entries in a directory listings, comprising: receiving a first directory listings including one or more fields, the one or more fields populated with entries including one or more symbol strings; receiving a second directory listings including one or more fields, the one or more fields of the second directory listings populated with entries including one or more symbol strings; correlating entries in the one or more fields of the first directory listings with entries in the corresponding one or more fields of the second directory listings; identifying entries, in the one or more fields of the first directory listings, which do not correlate with entries in the corresponding one or more fields of the second directory listings; processing the identified entries using a rule set corresponding to the field in which the entry is located; based on the rule set, determining a corresponding confidence level for the processed entries; automatically modifying the processed entries having the corresponding confidence level meeting or exceeding a threshold; and outputting the automatically modified entries for processing.
 2. The method of claim 1, further comprising: marking the processed entries having the corresponding confidence level below the threshold for operator confirmation.
 3. The method of claim 2, further comprising: presenting at least one of the marked entries to an operator using a graphical user interface; presenting one or more rules from the rule set, corresponding to the field in which the at least one of the marked entries is located, to the operator using the graphical user interface; receiving an operator's input selecting at least one of the one or more rules; and processing the at least one of the marked entries in accordance with the operator's selection.
 4. The method of claim 3, further comprising: outputting the at least one of the marked entries processed in accordance with the operator's selection to an automated attendant.
 5. The method of claim 3, further comprising: outputting the at least one of the marked entries processed in accordance with the operator's selection to a pre-processed listings database.
 6. The method of claim 2, further comprising: presenting at least one of the marked entries to an operator using a graphical user interface; receiving an operator's inputs to manually modify the at least one of the marked entries; and modifying the at least one of the marked entries in accordance with the manual inputs from the operator.
 7. The method of claim 2, further comprising: presenting one or more rules from the rule set, corresponding to the field in which the at least one of the marked entries is located, to the operator using the graphical user interface; receiving an operator's input modifying the at least one of the one or more rules; and processing the at least one of the marked entries in accordance with the modified rule.
 8. The method of claim 1, wherein the processing step comprises: selecting at least one of the identified entries; based on the correlation with corresponding entries in the second database, determining whether the selected entry from the first database includes inverted symbol strings; and if the selected entry is determined to include the inverted symbol strings, correcting the inversion in the selected entry.
 9. The method of claim 1, wherein the processing step comprises: selecting at least one of the identified entries; based on the correlation with corresponding entries in the second database, determining whether the selected entry from the first database includes an abbreviation; and if the selected entry is determined to include the abbreviation, expanding the abbreviation based on a closest correlation for the selected entry found in the second database.
 10. The method of claim 1, wherein the processing step comprises: selecting at least one of the identified entries; based on the correlation with corresponding entries in the second database, determining whether the selected entry from the first database includes extraneous information; and if the selected entry is determined to include extraneous information, removing the extraneous information based on a correlation for the selected entry found in the second database.
 11. The method of claim 1, wherein the second database is an official postal office database.
 12. Apparatus for pre-processing entries in a directory listings database comprising: a reference database configured to store one or more fields, the one or more fields populated with entries including one or more symbol strings; a rules database configured to store one or more rule sets; and a processor configured to: correlate entries contained in the directory listings database with entries in the corresponding one or more fields of the reference database, identify entries in the directory listings database which do not correlate with corresponding entries in the reference database, process the identified entries using the one or more rule sets from the rules database, based on the one or more rule sets, calculate a corresponding confidence level for the processed entries, and automatically modify the processed entries having the corresponding confidence level meeting or exceeding a threshold.
 13. The apparatus of claim 12, wherein the processor is further configured to output the automatically modified entries for processing.
 14. The apparatus of claim 12, wherein the processor is configured with a word order normalizer that corrects word order of entries contained in the directory listings database.
 15. The apparatus of claim 12, wherein the processor is configured with a street name expander that expands abbreviations of entries contained in the directory listings database.
 16. The apparatus of claim 12, wherein the processor is configured with a township corrector that removes extraneous information from entries contained in the directory listings database.
 17. The apparatus of claim 12, further comprising: a confirmed listings database configured to store the automatically modified entries having the corresponding confidence level meeting or exceeding the threshold.
 18. The apparatus of claim 12, further comprising: a non-confirmed listings database configured to store entries that have the corresponding confidence level below the threshold.
 19. A machine-readable medium having stored thereon a plurality of executable instructions, the plurality of instructions comprising instructions to: receive a first directory listings including one or more fields, the one or more fields populated with entries including one or more symbol strings; receive a second directory listings including one or more fields, the one or more fields of the second directory listings populated with entries including one or more symbol strings; correlate entries in the one or more fields of the first directory listings with entries in the corresponding one or more fields of the second directory listings; identify entries, in the one or more fields of the first directory listings, which do not correlate with entries in the corresponding one or more fields of the second directory listings; process the identified entries using a rule set corresponding to the field in which the entry is located; based on the rule set, determine a corresponding confidence level for the processed entries; automatically modify the processed entries having the corresponding confidence level meeting or exceeding a threshold; and output the automatically modified entries for processing.
 20. The machine-readable medium of claim 19 having stored thereon additional executable instructions, the additional instructions comprising instructions to: mark the processed entries having the corresponding confidence level below the threshold for operator confirmation.